Efficient Hold-Out for Subset of Regressors

Authors

  • Tapio Pahikkala
  • Hanna Suominen
  • Jorma Boberg
  • Tapio Salakoski
Abstract

Hold-out and cross-validation are among the most useful methods for model selection and performance assessment of machine learning algorithms. In this paper, we present a computationally efficient algorithm for calculating the hold-out performance for sparse regularized least-squares (RLS) in case the method is already trained with the whole training set. The computational complexity of performing the hold-out is O(|H|^3 + |H|^2 n), where |H| is the size of the hold-out set and n is the number of basis vectors. The algorithm can thus be used to calculate various types of cross-validation estimates effectively. For example, when m is the number of training examples, the complexities of N-fold and leave-one-out cross-validation are O(m^3/N^2 + (m^2 n)/N) and O(mn), respectively. Further, since sparse RLS can be trained in O(mn) time for several regularization parameter values in parallel, the fast hold-out algorithm enables efficient selection of the optimal parameter value.
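The identity that makes this kind of shortcut possible can be illustrated with plain (non-sparse) kernel RLS: once the model is trained on all m examples, the predictions a model trained *without* the hold-out set H would make on H can be recovered from blocks of the smoother matrix G = K(K + λI)^(-1) via p_H = (I − G_HH)^(-1)(f_H − G_HH y_H), at a cost dominated by inverting the |H|×|H| block. The sketch below is ours, not the paper's sparse algorithm; it uses a linear kernel and hypothetical variable names, and checks the shortcut against brute-force retraining.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 30
X = rng.normal(size=(m, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=m)
lam = 1.0

# Full-data kernel RLS: smoother matrix G = K (K + lam*I)^{-1}
# (solve gives (K + lam*I)^{-1} K, which equals K (K + lam*I)^{-1}
# because both factors are functions of the same symmetric K).
K = X @ X.T
G = np.linalg.solve(K + lam * np.eye(m), K)
f = G @ y                       # predictions of the full-data model

# Hold-out set H and its complement S (indices chosen arbitrarily).
H = np.array([2, 5, 11])
S = np.setdiff1d(np.arange(m), H)

# Fast hold-out: p_H = (I - G_HH)^{-1} (f_H - G_HH y_H),
# O(|H|^3) once G's blocks are available.
G_HH = G[np.ix_(H, H)]
p_fast = np.linalg.solve(np.eye(len(H)) - G_HH, f[H] - G_HH @ y[H])

# Brute force for comparison: retrain on S only, predict on H.
a = np.linalg.solve(K[np.ix_(S, S)] + lam * np.eye(len(S)), y[S])
p_slow = K[np.ix_(H, S)] @ a

assert np.allclose(p_fast, p_slow)
```

With |H| = 1 the same identity reduces to the classical leave-one-out shortcut for linear smoothers, p_i = (f_i − G_ii y_i)/(1 − G_ii); the paper's contribution is carrying this idea over to the sparse (subset-of-regressors) setting with n basis vectors.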


Similar articles

Vector Autoregressive Model Selection: Gross Domestic Product and Europe Oil Prices Data Modelling

 We consider the problem of model selection in vector autoregressive model with Normal innovation. Tests such as Vuong's and Cox's tests are provided for order and model selection, i.e. for selecting the order and a suitable subset of regressors, in vector autoregressive model. We propose a test as a modified log-likelihood ratio test for selecting subsets of regressors. The Europe oil prices, ...


Multiresponse Sparse Regression with Application to Multidimensional Scaling

Sparse regression is the problem of selecting a parsimonious subset of all available regressors for an efficient prediction of a target variable. We consider a general setting in which both the target and regressors may be multivariate. The regressors are selected by a forward selection procedure that extends the Least Angle Regression algorithm. Instead of the common practice of estimating eac...


It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals

Multi-task prediction methods are widely used to couple regressors or classification models by sharing information across related tasks. We propose a multi-task Gaussian process approach for modeling both the relatedness between regressors and the task correlations in the residuals, in order to more accurately identify true sharing between regressors. The resulting Gaussian model has a covarian...


Determinant Efficiencies in Ill-Conditioned Models

The canonical correlations between subsets of OLS estimators are identified with design linkage parameters between their regressors. Known collinearity indices are extended to encompass angles between each regressor vector and remaining vectors. One such angle quantifies the collinearity of regressors with the intercept, of concern in the corruption of all estimates due to ill-conditioning. Mat...


Well-dispersed subsets of non-dominated solutions for the MOMILP problem

This paper uses the weighted L1-norm to propose an algorithm for finding a well-dispersed subset of non-dominated solutions of a multiple objective mixed integer linear programming problem. When all variables are integer it finds the whole set of efficient solutions. In each iteration of the proposed method only a mixed integer linear programming problem is solved and its optimal solutions gen...




Publication date: 2009